library(ggplot2)
library(tidyverse)
library(plotly)

dxy <- read.csv("./data/USD.csv", header = TRUE)
dxy$Date <- as.Date(dxy$Date, format = "%m/%d/%Y")
dxy <- dxy[order(dxy$Date), ]

gg <- ggplot(data = dxy, aes(x = Date, y = Price)) +
  geom_line(color = "purple") +
  labs(x = "Year", title = "Trend of U.S. Dollar Index")
plotly_gg <- ggplotly(gg)
plotly_gg
The time series plot of the U.S. Dollar Index shows an upward trend and some quarterly seasonality, but no clearly periodic fluctuations across years. I preliminarily consider it to follow a multiplicative model.
bea <- read.csv("./data/bea.csv")
bea$time <- as.Date(bea$time)

gg <- ggplot(bea) +
  geom_line(aes(x = time, y = balance / 1000), color = "purple") +
  labs(x = "Year", y = "Billions of Dollars", title = "Trends of Trade Balance")
plotly_gg <- ggplotly(gg)
plotly_gg
The time series plot of the trade balance shows a strong seasonality, with July being the lowest month each year and January being the highest. This pattern may be due to factors such as seasonal fluctuations in demand and supply, where imports tend to increase during the holiday season in December, leading to a higher trade deficit in January. On the other hand, July may see a dip in trade activity due to summer holidays, reduced production, and lower consumer demand.
gdp <- read.csv("./data/gdp.csv")
gdp$time <- as.Date(gdp$time)
gdp$total <- gdp$consumption + gdp$investment + gdp$net_export + gdp$government

gg <- ggplot(data = gdp, aes(x = time, y = total)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Billions of Dollars", title = "GDP Over Time")
plotly_gg <- ggplotly(gg)
plotly_gg
The time series plot of GDP shows a steady increasing trend. Since the raw data we extracted from the BEA is seasonally adjusted at annual rates, there is no noticeable seasonality in the data. Given this, I preliminarily consider the time series to follow an additive model, where the trend is consistent over time and the variations are not influenced by seasonal patterns. This additive model helps capture the general growth trajectory without the need for seasonal adjustments.
data_unem <- read.csv("./data/unem.csv", header = TRUE)
data_unem$time <- as.Date(data_unem$time)

gg <- ggplot(data = data_unem, aes(x = time, y = unem)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Rate", title = "U.S. Unemployment Rate Over Time")
plotly_gg <- ggplotly(gg)
plotly_gg
The unemployment rate increased significantly over the period, showing both an upward trend and seasonality. A log transformation may be needed in further analysis.
data_cpi <- read.csv("./data/cpi.csv", header = TRUE)
data_cpi$time <- as.Date(data_cpi$time)

gg <- ggplot(data = data_cpi, aes(x = time, y = cpi)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Index", title = "U.S. CPI Over Time")
plotly_gg <- ggplotly(gg)
plotly_gg
The Consumer Price Index (CPI) also shows an upward trend and seasonality. I preliminarily consider it to follow an additive model.
library(quantmod)
getSymbols("^GSPC", src = "yahoo", from = "2005-01-01", to = "2024-12-31")
[1] "GSPC"
data <- data.frame(
  Date = index(GSPC),
  Open = GSPC[, "GSPC.Open"],
  High = GSPC[, "GSPC.High"],
  Low = GSPC[, "GSPC.Low"],
  Close = GSPC[, "GSPC.Close"]
)
colnames(data) <- c("Date", "Open", "High", "Low", "Close")

figc <- data %>%
  plot_ly(
    x = ~Date, type = "candlestick",
    open = ~Open, close = ~Close,
    high = ~High, low = ~Low
  )
figc <- figc %>%
  layout(
    title = "S&P 500 Index Candlestick Plot",
    xaxis = list(type = "date", title = "Date"),
    yaxis = list(title = "Index Price")
  )
figc
The S&P 500 index shows an overall upward trend. However, it does not exhibit clear seasonality or periodic fluctuations across years. I preliminarily consider it to follow an additive model.
xau <- read.csv("./data/xau.csv")
xau$Date <- as.Date(xau$Date)

ggplot(data = xau, aes(x = Date, y = Price)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Dollar ($)", title = "Spot Gold in US Dollar")
The plot shows an upward trend in spot gold prices, with annual seasonality. I preliminarily consider it to follow a multiplicative model.
library(readxl)
gsci <- read.csv("./data/gsci.csv")
gsci$Date <- as.Date(gsci$Date)

ggplot(data = gsci, aes(x = Date, y = Price)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Dollar ($)", title = "S&P GSCI Index (USD) Over Time")
The plot shows a stable trend from 2015 to 2020, an increase from 2020 to 2022, and a decrease from 2022 to 2025. There is an annual seasonality, but no obvious cycle. I preliminarily consider it to follow a multiplicative model.
house <- read.csv("./data/house.csv", header = TRUE)
house$time <- as.Date(house$time)

gg <- ggplot(data = house, aes(x = time, y = index)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Index", title = "House Price Index Over Time")
plotly_gg <- ggplotly(gg)
plotly_gg
The House Price Index shows an overall upward trend. However, it does not exhibit clearly periodic fluctuations across years. I preliminarily consider it to follow a multiplicative model.
visitors <- read.csv("./data/visitors.csv", header = TRUE)
visitors$time <- as.Date(visitors$time)

gg <- ggplot(data = visitors, aes(x = time, y = count)) +
  geom_line(color = "purple") +
  labs(x = "Year", y = "Number of Visitors", title = "Non-U.S. Resident Visitor Arrivals to the U.S.")
plotly_gg <- ggplotly(gg)
plotly_gg
The data on visitor arrivals to the United States shows a clear seasonal pattern, with a stable, gradual increase from 2005 to 2020. I preliminarily consider it to follow an additive model.
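The additive versus multiplicative choice made for each series above can be checked with a classical decomposition. The sketch below is only an illustration on simulated monthly data (the series `y`, its trend, and its seasonal factors are all made up, not taken from the project data); it uses base R's `decompose()` with both model types so the two fits can be compared.

```r
# Illustrative only: simulated monthly series whose seasonal swing grows with the level.
set.seed(42)
t <- 1:120
trend <- 100 + 2 * t
season <- rep(c(0.90, 0.95, 1.00, 1.05, 1.10, 1.05,
                1.00, 0.95, 0.90, 0.95, 1.05, 1.10), times = 10)
y <- ts(trend * season * exp(rnorm(120, sd = 0.01)),
        frequency = 12, start = c(2005, 1))

# Fit both classical decompositions.
add <- decompose(y, type = "additive")
mul <- decompose(y, type = "multiplicative")

# Additive seasonal effects are centred on 0, multiplicative ones on 1.
round(add$figure, 2)
round(mul$figure, 3)

# Compare how the remainder behaves early vs. late in the sample;
# a remainder whose spread grows with the level hints at a multiplicative structure.
sd(window(add$random, end = c(2009, 12)), na.rm = TRUE)
sd(window(add$random, start = c(2010, 1)), na.rm = TRUE)
```

If the additive remainder fans out as the level rises while the multiplicative remainder stays roughly constant around 1, the multiplicative model is the more plausible starting point.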
2. Lag plot
Based on the time series plots above, we apply log transformation to the following variables: USD Index, Unemployment Rate, Gold Price, Global Commodity Price, and Number of International Visitors. This transformation is used to stabilize variance, reduce skewness, and better model the growth patterns in these variables. Taking the logarithm helps to linearize exponential growth trends, especially for variables like Gold Price and Commodity Prices, which often exhibit high volatility and nonlinear behavior.
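As a rough illustration of why the log transformation helps, the sketch below builds a hypothetical series with exponential growth and a proportional seasonal factor (none of these numbers come from the project data) and compares the within-year swing before and after taking logs.

```r
# Illustrative only: deterministic series with exponential growth and proportional seasonality.
t <- 1:120
season <- 1 + 0.2 * sin(2 * pi * t / 12)
y <- 100 * exp(0.02 * t) * season

# Size of the swing within a 12-month window.
swing <- function(x) diff(range(x))

# On the raw scale the swing grows with the level of the series...
raw_first <- swing(y[1:12])
raw_last  <- swing(y[109:120])

# ...while on the log scale it is the same in the first and last year.
log_first <- swing(log(y[1:12]))
log_last  <- swing(log(y[109:120]))

c(raw_first = raw_first, raw_last = raw_last)
c(log_first = log_first, log_last = log_last)
```

On the log scale the exponential trend becomes a straight line and the seasonal component becomes additive, which is the sense in which the transformation stabilizes variance.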
From the lag plot, we can see that the U.S. Dollar Index has a strong positive autocorrelation. Even at lag 9, it still shows a high autocorrelation, indicating that past values have a lasting impact on current values. This suggests a persistent trend and momentum in the time series data.
From the lag plot, we can see that the trade balance data has a strong positive autocorrelation in the first three lags, which then gradually weakens until it becomes very weak in lags 7-9. Moreover, since data points from different quarters are clustered together, there is no clear seasonal pattern in the serial correlation. This suggests that while past values influence current values in the short term, the effect diminishes over time.
From the lag plot, we can see that the GDP data has a strong positive autocorrelation in the first two lags, which then gradually weakens until it becomes very weak in lags 6-9. Moreover, since data points from different quarters are clustered together, there is no clear seasonal pattern in the serial correlation. This suggests that while past values influence current values in the short term, the effect diminishes over time.
From the lag plot, we can see that the Spot Gold Price has a strong positive autocorrelation. Even at lag 16, it still shows a high autocorrelation, indicating that past values have a lasting impact on current values.
From the lag plot, we can see that the S&P GSCI Index has a strong positive autocorrelation. Even at lag 16, it still shows a high autocorrelation, indicating that past values have a lasting impact on current values.
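The code behind the lag plots is not shown in this section, so the following is only a sketch of how such a plot can be produced and read, using base R's `lag.plot()` on a simulated random walk (the series `rw` is hypothetical, not one of the project series). The correlations printed alongside are the numbers the eye is estimating from each panel.

```r
# Illustrative only: lag plot and lag correlations for a simulated random walk.
set.seed(123)
rw <- ts(cumsum(rnorm(240)), frequency = 12)

# Correlation between the series and its own k-th lag, for k = 1..9.
lag_cor <- sapply(1:9, function(k) {
  n <- length(rw)
  cor(rw[(k + 1):n], rw[1:(n - k)])
})
round(lag_cor, 3)

# Scatter of y[t] against y[t-k]; points hugging the diagonal mean strong autocorrelation.
lag.plot(rw, lags = 9, do.lines = FALSE)
```

A persistent series keeps its points close to the diagonal even at high lags, whereas a series with short memory spreads into a shapeless cloud after a few lags.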
library(forecast)   # provides ggAcf() and ggPacf()
library(gridExtra)  # provides grid.arrange()

acf <- ggAcf(dxy_ts) +
  ggtitle("ACF Plot for USD Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(dxy_ts) +
  ggtitle("PACF Plot for USD Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
The ACF plot decays very slowly: autocorrelation remains high for many lags before approaching zero. The PACF plot has a single large spike at lag 1 and then drops to near zero, so only the first lag is significant. This pattern indicates that the U.S. Dollar Index time series is non-stationary and may contain a unit root.
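To make the "slow ACF decay, single PACF spike" signature concrete, the sketch below contrasts a simulated random walk with a simulated stationary AR(1) process using base R's `acf()` and `pacf()`. Both series and the AR coefficient of 0.5 are assumptions chosen for illustration only.

```r
# Illustrative only: ACF/PACF of a unit root series vs. a stationary AR(1).
set.seed(7)
n <- 500
rw  <- cumsum(rnorm(n))                                   # random walk (unit root)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # stationary AR(1)

acf_rw  <- as.numeric(acf(rw,  lag.max = 20, plot = FALSE)$acf)[-1]  # drop lag 0
acf_ar  <- as.numeric(acf(ar1, lag.max = 20, plot = FALSE)$acf)[-1]
pacf_rw <- as.numeric(pacf(rw, lag.max = 20, plot = FALSE)$acf)

# The random walk's ACF is still large at lag 10; the AR(1) ACF has died out.
round(c(random_walk = acf_rw[10], ar1 = acf_ar[10]), 3)

# The random walk's PACF is large at lag 1 and small afterwards.
round(pacf_rw[1:5], 3)
```

A slowly decaying ACF alone does not distinguish a unit root from a deterministic trend, which is why the visual check is followed by a formal test.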
acf <- ggAcf(balance_ts) +
  ggtitle("ACF Plot for Trade Balance") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(balance_ts) +
  ggtitle("PACF Plot for Trade Balance") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
Similar to the U.S. Dollar Index time series, the ACF plot for the trade balance shows a very slow decay over time, while the PACF plot has a large spike at lag 1 and then rapidly approaches zero. However, the decay in the ACF plot is slightly faster than that of a typical unit root series. This suggests that the non-stationarity in the trade balance series may be due to a deterministic trend rather than a unit root, meaning it may be trend stationary.
acf <- ggAcf(gdp_ts) +
  ggtitle("ACF Plot for GDP") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(gdp_ts) +
  ggtitle("PACF Plot for GDP") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
The ACF plot decays very slowly: autocorrelation remains high for many lags before approaching zero. The PACF plot has a single large spike at lag 1 and then drops to near zero, so only the first lag is significant. This pattern indicates that the GDP time series is non-stationary and may contain a unit root.
acf <- ggAcf(unem_ts) +
  ggtitle("ACF Plot for Unemployment Rate") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(unem_ts) +
  ggtitle("PACF Plot for Unemployment Rate") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(cpi_ts) +
  ggtitle("ACF Plot for CPI") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(cpi_ts) +
  ggtitle("PACF Plot for CPI") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(sp5_ts) +
  ggtitle("ACF Plot for S&P 500 Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(sp5_ts) +
  ggtitle("PACF Plot for S&P 500 Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(xau_ts) +
  ggtitle("ACF Plot for Gold Price") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(xau_ts) +
  ggtitle("PACF Plot for Gold Price") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
The ACF plot decays very slowly: autocorrelation remains high for many lags before approaching zero. The PACF plot has a single large spike at lag 1 and then drops to near zero, so only the first lag is significant. This pattern indicates that the Spot Gold Price time series is non-stationary and may contain a unit root.
acf <- ggAcf(gsci_ts) +
  ggtitle("ACF Plot for S&P GSCI Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(gsci_ts) +
  ggtitle("PACF Plot for S&P GSCI Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
The ACF plot decays very slowly: autocorrelation remains high for many lags before approaching zero. The PACF plot has a single large spike at lag 1 and then drops to near zero, so only the first lag is significant. This pattern indicates that the S&P GSCI Index is non-stationary and may contain a unit root.
acf <- ggAcf(house_ts) +
  ggtitle("ACF Plot for House Price Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(house_ts) +
  ggtitle("PACF Plot for House Price Index") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
acf <- ggAcf(visitors_ts) +
  ggtitle("ACF Plot for Number of Visitors") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
pacf <- ggPacf(visitors_ts) +
  ggtitle("PACF Plot for Number of Visitors") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(acf, pacf, nrow = 2)
Augmented Dickey-Fuller Test
data: dxy_ts
Dickey-Fuller = -2.6215, Lag order = 10, p-value = 0.3152
alternative hypothesis: stationary
The p-value in the ADF test for the U.S. Dollar Index is 0.32, so we fail to reject the null hypothesis of a unit root at the 10% significance level. There is insufficient evidence to conclude that the U.S. Dollar Index is stationary, which is consistent with the series containing a unit root and being non-stationary.
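To show what the Dickey-Fuller statistic measures, the sketch below runs an ADF-style regression by hand in base R on two simulated series. The helper `adf_stat()` is hypothetical and written for illustration; it regresses the differenced series on the lagged level, a linear trend, and `k` lagged differences, and returns the t-statistic on the lagged level. It does not compute a p-value, so it is not a replacement for `adf.test()`.

```r
# Illustrative only: hand-rolled ADF-style regression on simulated data.
set.seed(2024)

adf_stat <- function(y, k = 4) {
  dy <- diff(y)
  n <- length(dy)
  Z <- embed(dy, k + 1)        # columns: dy[t], dy[t-1], ..., dy[t-k]
  y_lag <- y[(k + 1):n]        # level y[t-1], aligned with dy[t]
  trend <- seq_along(y_lag)    # linear time trend
  fit <- lm(Z[, 1] ~ y_lag + trend + Z[, -1])
  coef(summary(fit))["y_lag", "t value"]
}

random_walk <- cumsum(rnorm(300))  # has a unit root
white_noise <- rnorm(300)          # stationary

stat_rw <- adf_stat(random_walk)
stat_wn <- adf_stat(white_noise)
c(random_walk = stat_rw, white_noise = stat_wn)
```

The more negative the statistic, the stronger the evidence against a unit root. For a regression with constant and trend, the large-sample 5% critical value is roughly -3.41, so a stationary series typically falls well below it while a random walk usually does not.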
adf.test(balance_ts)
Augmented Dickey-Fuller Test
data: balance_ts
Dickey-Fuller = -2.1425, Lag order = 4, p-value = 0.5174
alternative hypothesis: stationary
The p-value in the ADF test for the trade balance is 0.52, so we fail to reject the null hypothesis of a unit root at the 10% significance level. The ACF and PACF plots in Part 4 suggested that the series might be trend stationary rather than unit root non-stationary, but the ADF test does not support that reading: adf.test already includes a constant and a linear trend in its regression, so its alternative hypothesis covers stationarity around a trend. We therefore treat the trade balance as non-stationary, while noting that the ADF test has low power in short samples and cannot by itself rule out trend stationarity.
adf.test(gdp_ts)
Augmented Dickey-Fuller Test
data: gdp_ts
Dickey-Fuller = -0.0084786, Lag order = 4, p-value = 0.99
alternative hypothesis: stationary
The p-value in the ADF test for GDP is 0.99. There is insufficient evidence to conclude that GDP is stationary at the 10% significance level, which is consistent with the time series containing a unit root and being non-stationary.
adf.test(unem_ts)
Augmented Dickey-Fuller Test
data: unem_ts
Dickey-Fuller = -2.2088, Lag order = 6, p-value = 0.4882
alternative hypothesis: stationary
adf.test(cpi_ts)
Augmented Dickey-Fuller Test
data: cpi_ts
Dickey-Fuller = 0.39171, Lag order = 6, p-value = 0.99
alternative hypothesis: stationary
adf.test(sp5_ts)
Augmented Dickey-Fuller Test
data: sp5_ts
Dickey-Fuller = -0.67242, Lag order = 17, p-value = 0.973
alternative hypothesis: stationary
adf.test(xau_ts)
Augmented Dickey-Fuller Test
data: xau_ts
Dickey-Fuller = -2.186, Lag order = 10, p-value = 0.4996
alternative hypothesis: stationary
The p-value in the ADF test for the Spot Gold Price is 0.50. There is insufficient evidence to conclude that it is stationary at the 10% significance level, which is consistent with the time series containing a unit root and being non-stationary.
adf.test(gsci_ts)
Augmented Dickey-Fuller Test
data: gsci_ts
Dickey-Fuller = -2.362, Lag order = 13, p-value = 0.4251
alternative hypothesis: stationary
The p-value in the ADF test for the S&P GSCI Index is 0.43. There is insufficient evidence to conclude that the index is stationary at the 10% significance level, which is consistent with the time series containing a unit root and being non-stationary.
adf.test(house_ts)
Augmented Dickey-Fuller Test
data: house_ts
Dickey-Fuller = -1.3735, Lag order = 4, p-value = 0.832
alternative hypothesis: stationary
adf.test(visitors_ts)
Augmented Dickey-Fuller Test
data: visitors_ts
Dickey-Fuller = -2.4163, Lag order = 6, p-value = 0.4008
alternative hypothesis: stationary
The p-values in the ADF (Augmented Dickey-Fuller) tests for all variables are above 0.10, so we fail to reject the null hypothesis of a unit root for any of them. This suggests that all of the time series are non-stationary: their statistical properties, such as mean and variance, change over time. To make them stationary, we apply differencing, which removes trends so that the mean of the differenced series is approximately constant. Differencing is therefore required before further time series analysis.
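When differencing in R, it is worth keeping in mind that `diff()` has two separate arguments, `lag` and `differences`, and that the second positional argument is `lag`. The toy vector below (made up for illustration) shows the difference between a lag-2 difference and a second-order difference.

```r
# Illustrative only: lag vs. order of differencing in base R's diff().
x <- c(1, 4, 9, 16, 25)

first_diff  <- diff(x)                   # x[t] - x[t-1]
lag2_diff   <- diff(x, lag = 2)          # x[t] - x[t-2]; same as diff(x, 2)
second_diff <- diff(x, differences = 2)  # difference of the first difference

first_diff   # 3 5 7 9
lag2_diff    # 8 12 16
second_diff  # 2 2 2
```

A quadratic trend such as the one in `x` is removed only by the second-order difference, which is constant here, while the lag-2 difference still trends upward.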
diff1 <- ggAcf(diff(dxy_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(dxy_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(balance_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(balance_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(gdp_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(gdp_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(unem_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(unem_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(cpi_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(cpi_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(sp5_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(sp5_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(xau_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(xau_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(gsci_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(gsci_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(house_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(house_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)
diff1 <- ggAcf(diff(visitors_ts), lag.max = 50) +
  ggtitle("ACF of First Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
# differences = 2 gives a second-order difference (diff(x, 2) is a lag-2 difference)
diff2 <- ggAcf(diff(visitors_ts, differences = 2), lag.max = 50) +
  ggtitle("ACF of Second Differencing") +
  theme_bw() +
  geom_segment(lineend = "butt", color = "#5a3196") +
  geom_hline(yintercept = 0, color = "#5a3196")
grid.arrange(diff1, diff2, nrow = 2)